IEEE Robotics and Automation Letters
Gaussian Variational Inference with Non-Gaussian Factors for State Estimation: A UWB Localization Case Study
Stirling, Andrew, Lukashchuk, Mykola, Bagaev, Dmitry, Kouw, Wouter, Forbes, James R.
This letter extends the exactly sparse Gaussian variational inference (ESGVI) algorithm for state estimation in two complementary directions. First, ESGVI is generalized to operate on matrix Lie groups, enabling the estimation of states with orientation components while respecting the underlying group structure. Second, factors are introduced to accommodate heavy-tailed and skewed noise distributions, as commonly encountered in ultra-wideband (UWB) localization due to non-line-of-sight (NLOS) and multipath effects. Both extensions are shown to integrate naturally within the ESGVI framework while preserving its sparse and derivative-free structure. The proposed approach is validated in a UWB localization experiment with NLOS-rich measurements, demonstrating improved accuracy and comparable consistency. Finally, a Python implementation within a factor-graph-based estimation framework is made open-source (https://github.com/decargroup/gvi_ws) to support broader research use.
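The abstract does not spell out the non-Gaussian factors; as a purely illustrative sketch (the function names and the Student's t form are assumptions, not the authors' factors), the appeal of a heavy-tailed factor for NLOS-corrupted UWB ranges can be seen by comparing its negative log-likelihood against a Gaussian's:

```python
import math

def gaussian_nll(r, sigma):
    # Negative log-likelihood of a zero-mean Gaussian residual (up to a constant).
    return 0.5 * (r / sigma) ** 2

def student_t_nll(r, sigma, nu):
    # Negative log-likelihood of a Student's t residual (up to a constant).
    # Heavy tails penalize large NLOS-like outliers far less than a Gaussian.
    return 0.5 * (nu + 1.0) * math.log(1.0 + (r / sigma) ** 2 / nu)

# Near zero residual the two costs agree closely, but at a 5-sigma
# NLOS-like residual the t factor's cost grows only logarithmically.
```

The quadratic Gaussian cost lets a single biased range dominate the estimate, while the logarithmic growth of the t cost effectively down-weights such measurements.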
Flow-Aided Flight Through Dynamic Clutters From Point To Motion
Xu, Bowen, Yan, Zexuan, Lu, Minghao, Fan, Xiyu, Luo, Yi, Lin, Youshen, Chen, Zhiqiang, Chen, Yeke, Qiao, Qiyuan, Lu, Peng
Challenges in traversing dynamic clutter lie mainly in efficiently perceiving environmental dynamics and generating evasive behaviors that account for obstacle movement. Previous solutions have made progress by explicitly modeling dynamic obstacle motion for avoidance, but this key dependency of decision-making is time-consuming and unreliable in highly dynamic scenarios with occlusions. In contrast, without introducing object detection, tracking, or prediction, we empower reinforcement learning (RL) with single-LiDAR sensing to realize an autonomous flight system directly from point to motion. For exteroception, a fixed-shape, low-resolution, and detail-safe distance map is encoded from raw point clouds, and an environment-change-sensing point flow is adopted as a motion feature extracted from multi-frame observations. These two are integrated into a lightweight and easy-to-learn representation of complex dynamic environments. For action generation, the behavior of avoiding dynamic threats in advance is implicitly driven by the proposed change-aware sensing representation, where policy optimization is guided by a relative-motion-modulated distance field. With deployment-friendly sensing simulation and dynamics-model-free acceleration control, the proposed system shows a superior success rate and adaptability compared with alternatives, and the policy derived from the simulator can drive a real-world quadrotor with safe maneuvers.
ShelfAware: Real-Time Visual-Inertial Semantic Localization in Quasi-Static Environments with Low-Cost Sensors
Agrawal, Shivendra, Brawer, Jake, Naik, Ashutosh, Roncone, Alessandro, Hayes, Bradley
Many indoor workspaces are quasi-static: global layout is stable but local semantics change continually, producing repetitive geometry, dynamic clutter, and perceptual noise that defeat vision-based localization. We present ShelfAware, a semantic particle filter for robust global localization that treats scene semantics as statistical evidence over object categories rather than fixed landmarks. ShelfAware fuses a depth likelihood with a category-centric semantic similarity and uses a precomputed bank of semantic viewpoints to perform inverse semantic proposals inside MCL, yielding fast, targeted hypothesis generation on low-cost, vision-only hardware. Across 100 global-localization trials spanning four conditions (cart-mounted, wearable, dynamic obstacles, and sparse semantics) in a semantically dense retail environment, ShelfAware achieves a 96% success rate (vs. 22% for MCL and 10% for AMCL) with a mean time-to-convergence of 1.91 s, attains the lowest translational RMSE in all conditions, and maintains stable tracking in 80% of tested sequences, all while running in real time on a consumer laptop-class platform. By modeling semantics distributionally at the category level and leveraging inverse proposals, ShelfAware resolves geometric aliasing and semantic drift common to quasi-static domains. Because the method requires only vision sensors and VIO, it integrates as an infrastructure-free building block for mobile robots in warehouses, labs, and retail settings; as a representative application, it also supports the creation of assistive devices providing start-anytime, shared-control assistive navigation for people with visual impairments.
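The fusion of a depth likelihood with a semantic similarity inside a particle filter can be sketched as a multiplicative weight update; this is an illustrative toy (the function and parameter names `fuse_weights`, `depth_likelihood`, `semantic_similarity` are hypothetical, not ShelfAware's API):

```python
def fuse_weights(particles, depth_likelihood, semantic_similarity):
    # Multiply the geometric depth likelihood by the category-level
    # semantic similarity for each particle, then normalize so the
    # weights form a proper distribution over hypotheses.
    weights = [depth_likelihood(p) * semantic_similarity(p) for p in particles]
    total = sum(weights)
    return [w / total for w in weights]
```

A particle must score well under both cues to keep weight, which is what suppresses geometrically aliased poses whose semantics do not match.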
Super4DR: 4D Radar-centric Self-supervised Odometry and Gaussian-based Map Optimization
Li, Zhiheng, Wang, Weihua, Shen, Qiang, Zhao, Yichen, Fang, Zheng
Conventional SLAM systems using visual or LiDAR data often struggle in poor lighting and severe weather. Although 4D radar is suited for such environments, its sparse and noisy point clouds hinder accurate odometry estimation, while the radar maps suffer from obscure and incomplete structures. Thus, we propose Super4DR, a 4D radar-centric framework for learning-based odometry estimation and Gaussian-based map optimization. First, we design a cluster-aware odometry network that incorporates object-level cues from the clustered radar points for inter-frame matching, alongside a hierarchical self-supervision mechanism to overcome outliers through spatio-temporal consistency, knowledge transfer, and feature contrast. Second, we propose using 3D Gaussians as an intermediate representation, coupled with a radar-specific growth strategy, selective separation, and multi-view regularization, to recover blurry map areas and those undetected based on image texture. Experiments show that Super4DR achieves a 67% performance gain over prior self-supervised methods, nearly matches supervised odometry, and narrows the map quality disparity with LiDAR while enabling multi-modal image rendering.
Spatiotemporal Calibration and Ground Truth Estimation for High-Precision SLAM Benchmarking in Extended Reality
Shu, Zichao, Bei, Shitao, Li, Lijun, Chen, Zetao
Simultaneous localization and mapping (SLAM) plays a fundamental role in extended reality (XR) applications. As the standards for immersion in XR continue to increase, the demands for SLAM benchmarking have become more stringent. Trajectory accuracy is the key metric, and marker-based optical motion capture (MoCap) systems are widely used to generate ground truth (GT) because of their drift-free and relatively accurate measurements. However, the precision of MoCap-based GT is limited by two factors: the spatiotemporal calibration with the device under test (DUT) and the inherent jitter in the MoCap measurements. These limitations hinder accurate SLAM benchmarking, particularly for key metrics like rotation error and inter-frame jitter, which are critical for immersive XR experiences. This paper presents a novel continuous-time maximum likelihood estimator to address these challenges. The proposed method integrates auxiliary inertial measurement unit (IMU) data to compensate for MoCap jitter. Additionally, a variable time synchronization method and a pose residual based on screw congruence constraints are proposed, enabling precise spatiotemporal calibration across multiple sensors and the DUT. Experimental results demonstrate that our approach outperforms existing methods, achieving the precision necessary for comprehensive benchmarking of state-of-the-art SLAM algorithms in XR applications. Furthermore, we thoroughly validate the practicality of our method by benchmarking several leading XR devices and open-source SLAM algorithms. The code is publicly available at https://github.com/ylab-xrpg/xr-hpgt.
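The paper's variable time synchronization is part of a continuous-time estimator; as a much simpler illustration of the same underlying problem (not the authors' method), a coarse integer-sample time offset between two equally sampled signals, e.g. angular-velocity magnitude from MoCap and from the IMU, can be found by discrete cross-correlation (`best_lag` is a hypothetical helper):

```python
def best_lag(a, b, max_lag):
    # Return the integer lag maximizing the correlation between two
    # equally sampled signals; a positive result means b lags a.
    def corr(x, y):
        n = min(len(x), len(y))
        return sum(xi * yi for xi, yi in zip(x[:n], y[:n]))

    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            scores[lag] = corr(a[lag:], b)
        else:
            scores[lag] = corr(a, b[-lag:])
    return max(scores, key=scores.get)
```

Such a coarse alignment is commonly used only to initialize a finer, sub-sample synchronization like the one estimated jointly in the paper.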
PPL: Point Cloud Supervised Proprioceptive Locomotion Reinforcement Learning for Legged Robots in Crawl Spaces
Ma, Bida, Xu, Nuo, Qi, Chenkun, Liu, Xin, Mo, Yule, Wang, Jinkai, Lu, Chunpeng
Legged locomotion in constrained spaces (called crawl spaces) is challenging. In crawl spaces, current proprioceptive locomotion learning methods struggle to achieve traversal because only ground features are inferred. In this study, a point cloud supervised RL framework for proprioceptive locomotion in crawl spaces is proposed. A state estimation network is designed to estimate the robot's collision states as well as ground and spatial features for locomotion. A point cloud feature extraction method is proposed to supervise the state estimation network. The method uses a polar-coordinate representation of the point cloud and MLPs for efficient feature extraction. Experiments demonstrate that, compared with existing methods, our method exhibits faster iteration time in training and more agile locomotion in crawl spaces. This study enhances the ability of legged robots to traverse constrained spaces without requiring exteroceptive sensors.
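A polar-coordinate point cloud representation with a fixed shape can be sketched as follows; this is an illustrative 2D toy (the name `polar_range_map` and the binning scheme are assumptions, not the paper's implementation):

```python
import math

def polar_range_map(points, n_bins=8, max_range=5.0):
    # Bin 2D points by azimuth angle and keep the nearest range per bin,
    # producing a fixed-shape, low-resolution summary of surrounding
    # geometry that an MLP can consume directly.
    ranges = [max_range] * n_bins
    for x, y in points:
        r = math.hypot(x, y)
        b = int((math.atan2(y, x) + math.pi) / (2.0 * math.pi) * n_bins) % n_bins
        ranges[b] = min(ranges[b], r)
    return ranges
```

The fixed output size is what makes such a representation convenient as a supervision target, regardless of how many raw points the sensor returns.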
CSMapping: Scalable Crowdsourced Semantic Mapping and Topology Inference for Autonomous Driving
Qiao, Zhijian, Yu, Zehuan, Li, Tong, Chou, Chih-Chung, Ding, Wenchao, Shen, Shaojie
Crowdsourcing enables scalable autonomous driving map construction, but low-cost sensor noise prevents quality from improving with data volume. We propose CSMapping, a system that produces accurate semantic maps and topological road centerlines whose quality consistently increases with more crowdsourced data. For semantic mapping, we train a latent diffusion model on HD maps (optionally conditioned on SD maps) to learn a generative prior of real-world map structure, without requiring paired crowdsourced/HD-map supervision. This prior is incorporated via constrained MAP optimization in latent space, ensuring robustness to severe noise and plausible completion in unobserved areas. Initialization uses a robust vectorized mapping module followed by diffusion inversion; optimization employs efficient Gaussian-basis reparameterization, projected gradient descent with multi-start, and a latent-space factor graph for global consistency. For topological mapping, we apply confidence-weighted k-medoids clustering and kinematic refinement to trajectories, yielding smooth, human-like centerlines robust to trajectory variation. Experiments on nuScenes, Argoverse 2, and a large proprietary dataset achieve state-of-the-art semantic and topological mapping performance, with thorough ablation and scalability studies.
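The core update of confidence-weighted k-medoids, choosing a representative member that minimizes a weighted distance sum, can be shown in one dimension; this is an illustrative sketch (the helper `weighted_medoid` is hypothetical, not CSMapping's code), with lateral offsets of trajectories standing in for the clustered data:

```python
def weighted_medoid(points, weights):
    # Return the member point minimizing the confidence-weighted sum of
    # distances to all points: the update step of weighted k-medoids.
    # Unlike a mean, the result is always an actual observed point.
    best, best_cost = None, float("inf")
    for p in points:
        cost = sum(w * abs(p - q) for q, w in zip(points, weights))
        if cost < best_cost:
            best, best_cost = p, cost
    return best
```

Because the medoid is an observed sample, a centerline built from medoids stays on driven paths, and high-confidence trajectories pull the representative toward themselves.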
A Cross-Embodiment Gripper Benchmark for Rigid-Object Manipulation in Aerial and Industrial Robotics
Vagas, Marek, Varga, Martin, Romancik, Jaroslav, Majercak, Ondrej, Suarez, Alejandro, Ollero, Anibal, Vanderborght, Bram, Virgala, Ivan
Robotic grippers are increasingly deployed across industrial, collaborative, and aerial platforms, where each embodiment imposes distinct mechanical, energetic, and operational constraints. Established YCB and NIST benchmarks quantify grasp success, force, or timing on a single platform, but do not evaluate cross-embodiment transferability or energy-aware performance, capabilities essential for modern mobile and aerial manipulation. This letter introduces the Cross-Embodiment Gripper Benchmark (CEGB), a compact and reproducible benchmarking suite extending YCB and selected NIST metrics with three additional components: a transfer-time benchmark measuring the practical effort required to exchange embodiments, an energy-consumption benchmark evaluating grasping and holding efficiency, and an intent-specific ideal payload assessment reflecting design-dependent operational capability. Together, these metrics characterize both grasp performance and the suitability of reusing a single gripper across heterogeneous robotic systems. A lightweight self-locking gripper prototype is implemented as a reference case. Experiments demonstrate rapid embodiment transfer (median 17.6 s across user groups), low holding energy for the gripper prototype (1.5 J per 10 s), and consistent grasp performance with cycle times of 3.2-3.9 s. CEGB thus provides a reproducible foundation for cross-platform, energy-aware evaluation of grippers in aerial and industrial manipulation domains.
Enhancing Kinematic Performances of Soft Continuum Robots for Magnetic Actuation
Wu, Zhiwei, Luo, Jiahao, Wei, Siyi, Zhang, Jinhui
Soft continuum robots achieve complex deformation through elastic equilibrium, making their reachable motions governed jointly by structural design and actuation-induced mechanics. This work develops a general formulation that integrates equilibrium computation with kinematic performance by evaluating Riemannian Jacobian spectra on the equilibrium manifold shaped by internal/external loading. The resulting framework yields a global performance functional that directly links structural parameters, actuation inputs, and the induced configuration-space geometry. We apply this general framework to magnetic actuation. Analytical characterization is obtained under weak uniform fields, revealing optimal placement and orientation of the embedded magnet with invariant scale properties. To address nonlinear deformation and spatially varying fields, a two-level optimization algorithm is developed that alternates between energy-based equilibrium search and gradient-based structural updates. Simulations and physical experiments across uniform-field, dipole-field, and multi-magnet configurations demonstrate consistent structural tendencies: aligned moments favor distal or mid-distal solutions through constructive torque amplification, whereas opposing moments compress optimal designs toward proximal regions due to intrinsic cancellation zones.
LiHRA: A LiDAR-Based HRI Dataset for Automated Risk Monitoring Methods
Plahl, Frederik, Katranis, Georgios, Mamaev, Ilshat, Morozov, Andrey
We present LiHRA, a novel dataset designed to facilitate the development of automated, learning-based, or classical risk monitoring (RM) methods for Human-Robot Interaction (HRI) scenarios. The growing prevalence of collaborative robots in industrial environments has increased the need for reliable safety systems. However, the lack of high-quality datasets that capture realistic human-robot interactions, including potentially dangerous events, slows development. LiHRA addresses this challenge by providing a comprehensive, multi-modal dataset combining 3D LiDAR point clouds, human body keypoints, and robot joint states, capturing the complete spatial and dynamic context of human-robot collaboration. This combination of modalities allows for precise tracking of human movement, robot actions, and environmental conditions, enabling accurate RM during collaborative tasks. The LiHRA dataset covers six representative HRI scenarios involving collaborative and coexistent tasks, object handovers, and surface polishing, with safe and hazardous versions of each scenario. In total, the dataset includes 4,431 labeled point clouds recorded at 10 Hz, providing a rich resource for training and benchmarking classical and AI-driven RM algorithms. Finally, to demonstrate LiHRA's utility, we introduce an RM method that quantifies the risk level in each scenario over time. This method leverages contextual information, including robot states and the dynamic model of the robot. With its combination of high-resolution LiDAR data, precise human tracking, robot state data, and realistic collision events, LiHRA offers an essential foundation for future research into real-time RM and adaptive safety strategies in human-robot workspaces.
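The general shape of such a risk quantification, combining human-robot separation with robot state, can be illustrated with a toy scoring function; the formula and the name `risk_level` are invented for illustration and are not the RM method introduced in the paper:

```python
def risk_level(human_pts, robot_pts, robot_speed, d_safe=1.0):
    # Illustrative risk score in [0, 1]: separation below d_safe raises
    # the score, higher robot speed amplifies it, and the result
    # saturates at 1.0 (maximum risk).
    d_min = min(
        ((hx - rx) ** 2 + (hy - ry) ** 2 + (hz - rz) ** 2) ** 0.5
        for hx, hy, hz in human_pts
        for rx, ry, rz in robot_pts
    )
    proximity = max(0.0, 1.0 - d_min / d_safe)
    return min(1.0, proximity * (1.0 + robot_speed))
```

A dataset like LiHRA supplies exactly the inputs such a function needs per frame: human keypoints, robot link positions from joint states, and robot velocities.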